After working with the multivariate, mostly exploratory, even heuristic techniques that are common in data science, the last topic of the IODS course takes us back to the task of building statistical models.
The new challenge is that the data include two types of dependencies simultaneously: in addition to the correlated variables that we have faced with all models and methods so far, the observations of the data are also correlated with each other.
Let's start this week's session.
Usually, we can assume that the observations are not correlated - instead, they are assumed to be independent of each other. However, in longitudinal settings this assumption seldom holds, because we have multiple observations or measurements of the same individuals. The concept of repeated measures highlights this phenomenon that is actually quite typical in many applications. Both types of dependencies (variables and observations) must be taken into account; otherwise the models will be biased.
To analyze this kind of data set, we will focus on a single class of methods, called linear mixed effects models, which can cope quite nicely with the setting described above.
Before we consider two examples of mixed models, namely the random intercept model and the random intercept and slope model, we will learn how to wrangle longitudinal data in wide form and long form, take a look at some graphical displays of longitudinal data, and try a simple summary measure approach that may sometimes provide a useful first step in these challenges. In passing, we “violently” apply the usual “fixed” models (although we know that they are not the right choice here) in order to compare the results and see the consequences of making invalid assumptions.
Load the packages first !!
#load required packages
library(ggplot2)
library(corrplot)
## corrplot 0.92 loaded
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ tibble 3.1.8 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ✔ purrr 0.3.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(GGally)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
library(dplyr)
library(stringr)
library(psych)
##
## Attaching package: 'psych'
##
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(FactoMineR)
library(lme4)
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
##
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
That’s a lot of packages !!!
Let's implement the analyses of Chapter 8 of MABS, using the R codes of Exercise Set 6: Meet and Repeat: PART I, but with the RATS data (from Chapter 9 and Meet and Repeat: PART II), as instructed in Moodle.
# read long format data of rats
RATSL <- read.csv('RATSL.csv')
# Let's convert the categorical variables to factors first
## We have ID and Group (just like in the wrangling exercise)
RATSL$ID <- factor(RATSL$ID)
RATSL$Group <- factor(RATSL$Group)
# glimpse and dimensions
head(RATSL);dim(RATSL)
## ID Group WD Weight Time
## 1 1 1 WD1 240 1
## 2 2 1 WD1 225 1
## 3 3 1 WD1 245 1
## 4 4 1 WD1 260 1
## 5 5 1 WD1 255 1
## 6 6 1 WD1 260 1
## [1] 176 5
str(RATSL)
## 'data.frame': 176 obs. of 5 variables:
## $ ID : Factor w/ 16 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Group : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 2 2 ...
## $ WD : chr "WD1" "WD1" "WD1" "WD1" ...
## $ Weight: int 240 225 245 260 255 260 275 245 410 405 ...
## $ Time : int 1 1 1 1 1 1 1 1 1 1 ...
summary(RATSL)
## ID Group WD Weight Time
## 1 : 11 1:88 Length:176 Min. :225.0 Min. : 1.00
## 2 : 11 2:44 Class :character 1st Qu.:267.0 1st Qu.:15.00
## 3 : 11 3:44 Mode :character Median :344.5 Median :36.00
## 4 : 11 Mean :384.5 Mean :33.55
## 5 : 11 3rd Qu.:511.2 3rd Qu.:50.00
## 6 : 11 Max. :628.0 Max. :64.00
## (Other):110
The data set contains 16 rats, each with 11 weight measurements taken over roughly nine weeks (days 1 to 64). The rats are divided into 3 groups based on treatment (diet). Weight is the outcome variable of this longitudinal study; the idea is to compare the weight development of the three groups over time.
ggplot(RATSL, aes(x = Time, y = Weight, group = ID)) +
geom_line(aes(linetype = Group))+
scale_x_continuous(name = "Time (days)", breaks = seq(0, 60, 10))+
scale_y_continuous(name = "Weight (grams)")+
theme(legend.position = "top")
# Draw the plot again: the weight of each rat over time, faceted by group
# Rats are divided into three groups
ggplot(RATSL, aes(x = Time, y = Weight, linetype = Group, color = ID)) +
geom_line() +
scale_linetype_manual(values = rep(1:10, times=4)) +
facet_grid(. ~ Group, labeller = label_both) +
theme(legend.position = "none") +
scale_y_continuous(limits = c(min(RATSL$Weight), max(RATSL$Weight)))
From the plot we can see that Group 1 has the most rats (8) and the lowest weights, already at the start of the study. Group 2 shows the largest weight gain relative to baseline, but has only 4 rats. Group 3 also has 4 rats, in almost the same weight range as Group 2, but its weights do not increase as steeply as in Group 2.
Higher baseline values tend to mean higher values throughout the study. This phenomenon is generally referred to as tracking.
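As a quick numeric check of tracking (a sketch, assuming the first and last measurement days are 1 and 64 as shown in the summary above), we could correlate each rat's baseline weight with its final weight:

```r
# Hedged sketch: correlate baseline (day 1) and final (day 64) weight per rat.
# Assumes RATSL is the long-format data read above, with Time values 1 and 64.
library(dplyr)
library(tidyr)

track <- RATSL %>%
  filter(Time %in% c(1, 64)) %>%
  select(ID, Time, Weight) %>%
  pivot_wider(names_from = Time, values_from = Weight, names_prefix = "day")

cor(track$day1, track$day64)  # a value close to 1 would indicate strong tracking
```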
The tracking phenomenon can be seen more clearly in a plot of the standardized values of each observation, i.e.,
\[standardised(x) = \frac{x - mean(x)}{ sd(x)}\]
# Standardize the variable Weight within each time point
# (note: scale() returns a one-column matrix, hence the dbl[,1] type below)
RATSL <- RATSL %>%
group_by(Time) %>%
mutate(Weight_std = (scale(Weight))) %>%
ungroup()
# Glimpse the data
glimpse(RATSL)
## Rows: 176
## Columns: 6
## $ ID <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2…
## $ Group <fct> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 1, 1, 1, 1,…
## $ WD <chr> "WD1", "WD1", "WD1", "WD1", "WD1", "WD1", "WD1", "WD1", "WD…
## $ Weight <int> 240, 225, 245, 260, 255, 260, 275, 245, 410, 405, 445, 555,…
## $ Time <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 8, 8, 8, 8,…
## $ Weight_std <dbl[,1]> <matrix[26 x 1]>
head(RATSL)
## # A tibble: 6 × 6
## ID Group WD Weight Time Weight_std[,1]
## <fct> <fct> <chr> <int> <int> <dbl>
## 1 1 1 WD1 240 1 -1.00
## 2 2 1 WD1 225 1 -1.12
## 3 3 1 WD1 245 1 -0.961
## 4 4 1 WD1 260 1 -0.842
## 5 5 1 WD1 255 1 -0.882
## 6 6 1 WD1 260 1 -0.842
# Plot again with the standardized weight in RATSL
ggplot(RATSL, aes(x = Time, y = Weight_std, linetype = Group, color = ID)) +
geom_line() +
scale_linetype_manual(values = rep(1:10, times=4)) +
facet_grid(. ~ Group, labeller = label_both) +
theme(legend.position = "none") +
scale_y_continuous(name = "standardized Weight")
After standardization, the group profiles look more comparable, and the tracking of individual rats is easier to see.
With large numbers of observations, graphical displays of individual response profiles are of little use. Investigators then commonly produce graphs showing the average (mean) profile of each treatment group, along with some indication of the variation of the observations at each time point, in this case the standard error of the mean:
\[se = \frac{sd(x)}{\sqrt{n}}\]
# Summary data with mean and standard error of RATSL Weight by Group and Time
RATSS <- RATSL %>%
group_by(Group, Time) %>%
summarise(mean = mean(Weight), se = (sd(Weight)/sqrt(length(Weight))) ) %>% #using formula above;
ungroup()
## `summarise()` has grouped output by 'Group'. You can override using the
## `.groups` argument.
# Glimpse the long data again (note: glimpse(RATSS) would show the new summary table)
glimpse(RATSL)
## Rows: 176
## Columns: 6
## $ ID <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1, 2…
## $ Group <fct> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 1, 1, 1, 1,…
## $ WD <chr> "WD1", "WD1", "WD1", "WD1", "WD1", "WD1", "WD1", "WD1", "WD…
## $ Weight <int> 240, 225, 245, 260, 255, 260, 275, 245, 410, 405, 445, 555,…
## $ Time <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 8, 8, 8, 8,…
## $ Weight_std <dbl[,1]> <matrix[26 x 1]>
# Plot the mean profiles
ggplot(RATSS, aes(x = Time, y = mean, color=Group, linetype = Group, shape = Group)) +
geom_line() +
scale_linetype_manual(values = c(1,2,3)) +
geom_point(size=3) +
scale_shape_manual(values = c(1,2,3)) +
geom_errorbar(aes(ymin = mean - se, ymax = mean + se), linetype = 1, width = 0.3) +
theme(legend.position = "right") +
scale_y_continuous(name = "mean(Weight) +/- se(Weight)")
From the plot we can see that the group mean profiles do not overlap. Group 1 clearly differs from Groups 2 and 3. It is also clear that the mean weight increases over time in every group, with the steepest increase in Groups 2 and 3.
Using the summary measure approach, we will look into the post-treatment values of the RATSL data set. Let's use the mean weight of each rat over the follow-up as the summary measure, and draw boxplots of this mean for each diet group, which is our treatment.
# Create a summary data set by treatment and subject, with the mean as the summary variable
# (note: Time > 0 keeps all days here, since the first measurement is on day 1)
RATSL8S <- RATSL %>%
filter(Time > 0) %>%
group_by(Group, ID) %>%
summarise(Weight_mean = mean(Weight)) %>%
ungroup()
## `summarise()` has grouped output by 'Group'. You can override using the
## `.groups` argument.
# Glimpse the data
glimpse(RATSL8S)
## Rows: 16
## Columns: 3
## $ Group <fct> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3
## $ ID <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
## $ Weight_mean <dbl> 261.0909, 237.6364, 260.1818, 266.5455, 269.4545, 274.7273…
# Draw a boxplot of the mean versus treatment
ggplot(RATSL8S, aes(x = Group, y = Weight_mean, color = Group)) +
geom_boxplot() +
stat_summary(fun = "mean", geom = "point", shape=23, size=4, fill = "white") +
scale_y_continuous(name = "mean(Weight), Days 1-60")
From the boxplots, we can see that all three groups have outliers. Group 2 has a particularly large one, which skews its distribution. The next step is to identify and filter out the outliers identified above.
# define outlier from group 3
g3 <- RATSL8S %>% filter(Group==3)
out3 <- min(g3$Weight_mean)
# Create a new data by filtering the outliers
RATSL8S2 <- RATSL8S %>%
filter(250 < Weight_mean & Weight_mean < 560 & Weight_mean != out3)
# Draw a boxplot of the mean versus diet
ggplot(RATSL8S2, aes(x = Group, y = Weight_mean, col=Group)) +
geom_boxplot() +
stat_summary(fun = "mean", geom = "point", shape=23, size=4, fill = "white") +
scale_y_continuous(name = "mean(Weight) by days")
### 6.7 T-test and ANOVA
Although the informal graphical material presented so far has indicated a lack of difference between some of the treatment groups, most investigators would still require a formal test. Consequently, we would normally apply a t-test to assess the difference between two treatment groups, and also calculate a confidence interval for that difference, using the outlier-free data created above. A t-test, however, only compares two groups, and here we have three groups. We will therefore use ANOVA, which tests for differences among multiple groups. ANOVA assumes homogeneity of variance: the variance in Groups 1-3 should be similar.
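As a sketch (this test was not run in the original analysis), a two-sample t-test could be carried out on the outlier-filtered summary data, for example between Groups 2 and 3, which are closest in weight range:

```r
# Hedged sketch: Welch two-sample t-test between Groups 2 and 3
# on the outlier-filtered summary data RATSL8S2 created above.
library(dplyr)

g23 <- RATSL8S2 %>%
  filter(Group %in% c(2, 3)) %>%
  droplevels()

t.test(Weight_mean ~ Group, data = g23, var.equal = FALSE)
# The output includes the t statistic, the p-value and a 95% confidence
# interval for the difference in the two group means.
```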
# Load the original wide-form RATS data (kept separate from the long-form RATSL)
RATS <- as_tibble(read.table("https://raw.githubusercontent.com/KimmoVehkalahti/MABS/master/Examples/data/rats.txt", header = TRUE, sep = '\t'))
RATS$ID <- factor(RATS$ID)
# Add the baseline weight (WD1) from the original data as a new variable to the summary data
join_vars <- c("ID","WD1")
RATSL8S3 <- RATSL8S2 %>%
left_join(RATS[join_vars], by = 'ID')
# Rename the baseline column
RATSL8S3 <- RATSL8S3 %>%
rename('Weight_baseline' = 'WD1')
# Fit the linear model with the mean Weight as the response
fit2 <- lm(Weight_mean ~ Weight_baseline + Group, data = RATSL8S3)
# Compute the analysis of variance table for the fitted model with anova()
anova(fit2)
## Analysis of Variance Table
##
## Response: Weight_mean
## Df Sum Sq Mean Sq F value Pr(>F)
## Weight_baseline 1 174396 174396 7999.137 1.384e-14 ***
## Group 2 1707 853 39.147 3.628e-05 ***
## Residuals 9 196 22
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Looking at the ANOVA table, both p-values are below the 0.05 significance level, so there appears to be a significant difference between the groups after adjusting for baseline weight. The table does not, however, tell us which groups differ from which; for that we need multiple comparisons. For normally distributed data, ANOVA is usually followed by Tukey's test for multiple comparisons; for data that clearly depart from normality, the Kruskal-Wallis test followed by Dunn's test is used instead. Assuming our data are normally distributed, as we have done throughout this exercise, we could perform Tukey's test for the pairwise comparisons.
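As a sketch (assuming the normality assumption holds), Tukey's HSD could be obtained by refitting the same baseline-adjusted model with aov():

```r
# Hedged sketch: Tukey's HSD for the pairwise Group comparisons,
# refitting the baseline-adjusted model with aov() on RATSL8S3.
fit_aov <- aov(Weight_mean ~ Weight_baseline + Group, data = RATSL8S3)
TukeyHSD(fit_aov, which = "Group")
# Each row gives the estimated difference between two groups, a 95%
# family-wise confidence interval, and an adjusted p-value.
```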
Let's implement the analyses of Chapter 9 of MABS, using the R codes of Exercise Set 6: Meet and Repeat: PART II, but with the BPRS data (from Chapter 8 and Meet and Repeat: PART I), as instructed in Moodle.
The BPRS data contain brief psychiatric rating scale (BPRS) scores measured before treatment (week 0) and weekly during 8 weeks of treatment. The patients (n=40) were randomly assigned to treatment arm 1 or 2, and we are interested in whether the BPRS scores differ depending on the received treatment. A lower score means fewer symptoms.
Let's load and explore the data first.
BPRSL <- read.table("BPRSL.csv", header = T, sep=",")
# Factor treatment & subject
BPRSL$treatment <- factor(BPRSL$treatment)
BPRSL$subject <- factor(BPRSL$subject)
# Glimpse the data
glimpse(BPRSL)
## Rows: 360
## Columns: 6
## $ X <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
## $ treatment <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ subject <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
## $ weeks <chr> "week0", "week0", "week0", "week0", "week0", "week0", "week0…
## $ bprs <int> 42, 58, 54, 55, 72, 48, 71, 30, 41, 57, 30, 55, 36, 38, 66, …
## $ week <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
str(BPRSL)
## 'data.frame': 360 obs. of 6 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ treatment: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
## $ subject : Factor w/ 20 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ weeks : chr "week0" "week0" "week0" "week0" ...
## $ bprs : int 42 58 54 55 72 48 71 30 41 57 ...
## $ week : int 0 0 0 0 0 0 0 0 0 0 ...
summary(BPRSL)
## X treatment subject weeks bprs
## Min. : 1.00 1:180 1 : 18 Length:360 Min. :18.00
## 1st Qu.: 90.75 2:180 2 : 18 Class :character 1st Qu.:27.00
## Median :180.50 3 : 18 Mode :character Median :35.00
## Mean :180.50 4 : 18 Mean :37.66
## 3rd Qu.:270.25 5 : 18 3rd Qu.:43.00
## Max. :360.00 6 : 18 Max. :95.00
## (Other):252
## week
## Min. :0
## 1st Qu.:2
## Median :4
## Mean :4
## 3rd Qu.:6
## Max. :8
##
The BPRSL data set has 360 observations and 6 variables. From the glimpse we can see that the two treatment arms are coded 1 and 2. Subjects are coded from 1 to 20 within each arm, so the same subject code appears in both treatments: subject 1 in treatment 1 and subject 1 in treatment 2 are different individuals.
#Lets plot the data
ggplot(BPRSL, aes(x = week, y = bprs, linetype = subject)) +
geom_line() +
scale_linetype_manual(values = rep(1:10, times=4)) +
facet_grid(. ~ treatment, labeller = label_both) +
theme(legend.position = "none") +
scale_y_continuous(limits = c(min(BPRSL$bprs), max(BPRSL$bprs)))
From the plot, BPRS appears to decrease during the treatment period in both treatment arms. However, no clear difference between the groups can be seen from the plots.
# lets create a regression model for BPRSL
BPRS_reg <- lm(bprs ~ week + treatment, data = BPRSL)
# print summary of the model
summary(BPRS_reg)
##
## Call:
## lm(formula = bprs ~ week + treatment, data = BPRSL)
##
## Residuals:
## Min 1Q Median 3Q Max
## -22.454 -8.965 -3.196 7.002 50.244
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 46.4539 1.3670 33.982 <2e-16 ***
## week -2.2704 0.2524 -8.995 <2e-16 ***
## treatment2 0.5722 1.3034 0.439 0.661
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.37 on 357 degrees of freedom
## Multiple R-squared: 0.1851, Adjusted R-squared: 0.1806
## F-statistic: 40.55 on 2 and 357 DF, p-value: < 2.2e-16
We have the BPRS score as our target variable, and time (week) and treatment as explanatory variables. From the model summary, the week variable is significantly associated with BPRS, but the treatment variable is not (p = 0.661): no significant difference between the treatments can be seen. However, this analysis assumes independence of the observations, i.e., that the repeated measurements of the same subject are unrelated to each other, which is not rational here. Therefore we now move on to models that allow the observations to be dependent: linear mixed effects models, which combine fixed effects with random effects.
#lets load the package
library(lme4)
# Create a random intercept model
BPRS_ref <- lmer(bprs ~ week + treatment + (1 | subject), data = BPRSL, REML = FALSE)
# Print the summary of the model
summary(BPRS_ref)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: bprs ~ week + treatment + (1 | subject)
## Data: BPRSL
##
## AIC BIC logLik deviance df.resid
## 2748.7 2768.1 -1369.4 2738.7 355
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.0481 -0.6749 -0.1361 0.4813 3.4855
##
## Random effects:
## Groups Name Variance Std.Dev.
## subject (Intercept) 47.41 6.885
## Residual 104.21 10.208
## Number of obs: 360, groups: subject, 20
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 46.4539 1.9090 24.334
## week -2.2704 0.2084 -10.896
## treatment2 0.5722 1.0761 0.532
##
## Correlation of Fixed Effects:
## (Intr) week
## week -0.437
## treatment2 -0.282 0.000
confint(BPRS_ref)
## Computing profile confidence intervals ...
## 2.5 % 97.5 %
## .sig01 4.951451 10.019188
## .sigma 9.486731 11.026604
## (Intercept) 42.608796 50.298982
## week -2.679990 -1.860844
## treatment2 -1.542804 2.687248
Let's first reflect on our BPRS data. The maximum and minimum scores are 95 and 18, respectively. Given the large between-subject variance of the random intercept (47.4), the subjects clearly differ in their baseline levels.
Let’s fit the random intercept and random slope model to our data:
Fitting a random intercept and random slope model allows the linear regression fits for each individual to differ not only in intercept but also in slope. This accounts for the individual differences in the subjects' symptom (BPRS score) profiles, as well as the effect of time.
# create a random intercept and random slope model
BPRS_ref1 <- lmer(bprs ~ week * treatment + (week | subject), data = BPRSL, REML = FALSE)
# print a summary of the model
summary(BPRS_ref1)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: bprs ~ week * treatment + (week | subject)
## Data: BPRSL
##
## AIC BIC logLik deviance df.resid
## 2744.3 2775.4 -1364.1 2728.3 352
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.0512 -0.6271 -0.0768 0.5288 3.9260
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## subject (Intercept) 64.9964 8.0620
## week 0.9687 0.9842 -0.51
## Residual 96.4707 9.8220
## Number of obs: 360, groups: subject, 20
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 47.8856 2.2521 21.262
## week -2.6283 0.3589 -7.323
## treatment2 -2.2911 1.9090 -1.200
## week:treatment2 0.7158 0.4010 1.785
##
## Correlation of Fixed Effects:
## (Intr) week trtmn2
## week -0.650
## treatment2 -0.424 0.469
## wek:trtmnt2 0.356 -0.559 -0.840
confint((BPRS_ref1))
## Computing profile confidence intervals ...
## Warning in FUN(X[[i]], ...): non-monotonic profile for .sig02
## Warning in confint.thpr(pp, level = level, zeta = zeta): bad spline fit
## for .sig02: falling back to linear interpolation
## 2.5 % 97.5 %
## .sig01 5.39069597 12.1613571
## .sig02 -0.82694500 0.1404228
## .sig03 0.43747319 1.6543265
## .sigma 9.10743668 10.6349246
## (Intercept) 43.31885100 52.4522602
## week -3.34923718 -1.9074295
## treatment2 -6.04400897 1.4617868
## week:treatment2 -0.07243289 1.5040996
# perform an ANOVA test on the two models to assess formal differences between them
anova(BPRS_ref1, BPRS_ref)
## Data: BPRSL
## Models:
## BPRS_ref: bprs ~ week + treatment + (1 | subject)
## BPRS_ref1: bprs ~ week * treatment + (week | subject)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## BPRS_ref 5 2748.7 2768.1 -1369.4 2738.7
## BPRS_ref1 8 2744.3 2775.4 -1364.1 2728.3 10.443 3 0.01515 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA comparison is significant (p ≈ 0.015), so adding the random slope clearly improves the model fit: it captures inter-individual differences in the rate of change over time. The treatment effect itself still appears weak. Let's fit a further slope model and compare the outcomes.
BPRS_ref2 <- lmer(bprs ~ week * treatment + (week | subject), data = BPRSL, REML = FALSE)
# print a summary of the model
summary(BPRS_ref2)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: bprs ~ week * treatment + (week | subject)
## Data: BPRSL
##
## AIC BIC logLik deviance df.resid
## 2744.3 2775.4 -1364.1 2728.3 352
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.0512 -0.6271 -0.0768 0.5288 3.9260
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## subject (Intercept) 64.9964 8.0620
## week 0.9687 0.9842 -0.51
## Residual 96.4707 9.8220
## Number of obs: 360, groups: subject, 20
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 47.8856 2.2521 21.262
## week -2.6283 0.3589 -7.323
## treatment2 -2.2911 1.9090 -1.200
## week:treatment2 0.7158 0.4010 1.785
##
## Correlation of Fixed Effects:
## (Intr) week trtmn2
## week -0.650
## treatment2 -0.424 0.469
## wek:trtmnt2 0.356 -0.559 -0.840
# perform an ANOVA test on the two models
anova(BPRS_ref2, BPRS_ref1)
## Data: BPRSL
## Models:
## BPRS_ref2: bprs ~ week * treatment + (week | subject)
## BPRS_ref1: bprs ~ week * treatment + (week | subject)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## BPRS_ref2 8 2744.3 2775.4 -1364.1 2728.3
## BPRS_ref1 8 2744.3 2775.4 -1364.1 2728.3 0 0
Finally, looking at the output, the two models and their results are identical. This is because BPRS_ref2 was fitted with exactly the same formula as BPRS_ref1 (week * treatment + (week | subject)), so the ANOVA comparison shows zero difference (Chisq = 0). To actually test the week-by-treatment interaction, the reference model should be the random intercept and slope model without the interaction term.
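As a sketch, the intended comparison (following MABS Chapter 9) would fit the random intercept and slope model without the interaction and compare it to the interaction model BPRS_ref1; the model name below is hypothetical:

```r
# Hedged sketch: random intercept and slope model WITHOUT the
# week:treatment interaction, compared against BPRS_ref1 (with it).
library(lme4)

BPRS_ref_noint <- lmer(bprs ~ week + treatment + (week | subject),
                       data = BPRSL, REML = FALSE)

# Likelihood-ratio test of the interaction term
anova(BPRS_ref_noint, BPRS_ref1)
```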
# draw the plot of BPRSL with the observed BPRS values
ggplot(BPRSL, aes(x = week, y = bprs, group = subject, col= treatment)) +
geom_line(aes(linetype = treatment)) +
scale_x_continuous(name = "Time (weeks)", breaks = seq(0, 8, 2)) +
scale_y_continuous(name = "Observed BPRS") +
theme(legend.position = "right") +
facet_grid(. ~ treatment, labeller=label_both)
# Create a vector of the fitted values
Fitted <- fitted(BPRS_ref)
Fitted1 <- fitted(BPRS_ref1)
Fitted2 <- fitted(BPRS_ref2)
# Create a new column fitted to BPRSL
BPRSL <- BPRSL %>% mutate(bprs_fitted_values_BPRSL_ref = Fitted, bprs_fitted_values_BPRSL_ref1 = Fitted1, bprs_fitted_values_BPRSL_ref2 = Fitted2)
head(BPRSL)
## X treatment subject weeks bprs week bprs_fitted_values_BPRSL_ref
## 1 1 1 1 week0 42 0 53.19460
## 2 2 1 2 week0 58 0 43.04516
## 3 3 1 3 week0 54 0 43.98584
## 4 4 1 4 week0 55 0 49.13483
## 5 5 1 5 week0 72 0 58.19506
## 6 6 1 6 week0 48 0 41.51037
## bprs_fitted_values_BPRSL_ref1 bprs_fitted_values_BPRSL_ref2
## 1 49.24017 49.24017
## 2 46.97380 46.97380
## 3 47.65582 47.65582
## 4 49.85313 49.85313
## 5 66.39001 66.39001
## 6 42.59363 42.59363
# draw the plot of BPRSL with the Fitted values of bprs model 1
ggplot(BPRSL, aes(x = week, y = bprs_fitted_values_BPRSL_ref, group = subject, col=treatment)) +
geom_line(aes(linetype = treatment)) +
scale_x_continuous(name = "Time (weeks)", breaks = seq(0, 8, 2)) +
scale_y_continuous(name = "Fitted BPRS (model 1: rnd intercept)") +
theme(legend.position = "right") +
facet_grid(. ~ treatment, labeller=label_both)
# draw the plot of BPRSL with the Fitted values of bprs model 2
ggplot(BPRSL, aes(x = week, y = bprs_fitted_values_BPRSL_ref1, group = subject, col=treatment)) +
geom_line(aes(linetype = treatment)) +
scale_x_continuous(name = "Time (weeks)", breaks = seq(0, 8, 2)) +
scale_y_continuous(name = "Fitted BPRS (model 2: rnd intercept + slope)") +
theme(legend.position = "right") +
facet_grid(. ~ treatment, labeller=label_both)
From the fitted-value plots, we can see that the random intercept model differs from the random intercept and slope model: in the former every subject shares the same slope, while in the latter the individual slopes vary. Since BPRS_ref2 repeated the same formula as BPRS_ref1, its fitted values did not change. In conclusion, the random intercept model does not capture the individual differences in the development of bprs over time as well as the random intercept and slope model, while the treatment effect remains non-significant in all of the models.
******************************************************************* END ******************************************************************
DONE AND DUSTED \(\large \surd\) !!!
Well, not really !! There is so much more to learn. This course was, however, a great “push”: I have truly enjoyed the exercise sessions, the course material and the review exercises. I have learned so much and I am looking forward to learning more !!
Merry Christmas and a Happy New Year Everyone !!